Learning visual categories
The primary goal of the present work was to develop methods for the representation of visual information that integrate appearance and structural visual cues. During our research we dealt with modelling objects' appearance and structure from single and multiple views, integrating different visual cues into unified models, applying statistical learning algorithms, and with object categorization. The developed methods were applied to the categorization of cars by type and of face images by gender and emotion. The obtained results demonstrate that this kind of integration of visual cues significantly improves the performance of classic visual categorization and recognition methods.
Video camera registration using accumulated co-motion maps
The paper presents a method to register partially overlapping camera views of scenes in which the objects of interest are in motion, even when both the environment and the motion are unstructured. In a typical outdoor multi-camera system the observed objects may appear very different due to changes in lighting conditions and differing camera positions. Hence, static features such as color, shape, and contours cannot be used for camera registration in these cases. The matching is done by computing co-motion statistics, followed by outlier rejection and a nonlinear optimization. The described robust algorithm finds point correspondences between two camera views (images) without searching for any objects and without tracking any continuous motion. Real-life outdoor experiments demonstrate the feasibility of our approach.
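The co-motion idea can be sketched as follows: whenever a reference pixel in view A is flagged as moving, accumulate view B's motion mask; over many frames the accumulated map peaks near the point in B whose motion co-occurs with the reference pixel. The snippet below is a minimal illustration of that accumulation step only, not the paper's full pipeline (which builds such statistics for many pixels and then applies outlier rejection and nonlinear optimization); the function name and the mask representation are assumptions.

```python
import numpy as np

def comotion_peak(masks_a, masks_b, ref_pixel):
    """Estimate the point in view B corresponding to ref_pixel in view A.

    masks_a, masks_b: lists of boolean motion masks, one per frame.
    Whenever ref_pixel is in motion in view A, view B's motion mask is
    added to an accumulator; the peak of the accumulated co-motion map
    is returned as the candidate correspondence.
    """
    acc = np.zeros(masks_b[0].shape, dtype=float)
    for ma, mb in zip(masks_a, masks_b):
        if ma[ref_pixel]:          # reference pixel moving in view A
            acc += mb              # accumulate co-occurring motion in view B
    return np.unravel_index(np.argmax(acc), acc.shape)
```

In a full system such peaks would be collected for many reference pixels, then cleaned by outlier rejection before estimating the view-to-view transform.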
Towards Contrastive Learning in Music Video Domain
Contrastive learning is a powerful way of learning multimodal representations across various domains such as image-caption retrieval and audio-visual representation learning. In this work, we investigate whether these findings generalize to the domain of music videos. Specifically, we create a dual encoder for the audio and video modalities and train it using a bidirectional contrastive loss. For the experiments, we use an industry dataset containing 550,000 music videos as well as the public Million Song Dataset, and evaluate the quality of the learned representations on the downstream tasks of music tagging and genre classification. Our results indicate that pre-trained networks without contrastive fine-tuning outperform our contrastive learning approach when evaluated on both tasks. To gain a better understanding of why contrastive learning was not successful for music videos, we perform a qualitative analysis of the learned representations, revealing why contrastive learning might have difficulties uniting embeddings from the two modalities. Based on these findings, we outline possible directions for future work. To facilitate the reproducibility of our results, we share our code and the pre-trained model.
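The bidirectional contrastive loss over a dual encoder can be sketched as a symmetric, InfoNCE-style objective: each audio/video pair in a batch is a positive, all other pairings are negatives, and the loss is averaged over both retrieval directions. The NumPy function below is a minimal sketch of that standard formulation, not the paper's actual implementation; the temperature value and argument shapes are assumptions.

```python
import numpy as np

def bidirectional_contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Row i of audio_emb and row i of video_emb are assumed to be a
    positive pair; every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot product becomes cosine similarity
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature  # (N, N) pairwise similarities

    def cross_entropy(logits):
        # softmax cross-entropy with the positives on the diagonal
        shifted = logits - logits.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the audio->video and video->audio directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In practice such a loss is minimized over mini-batches with gradient descent so that matching audio and video clips map to nearby points in the shared embedding space.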
Higher order symmetry for non-linear classification of human walk detection
The paper focuses on motion-based information extraction from cluttered video image sequences. A novel method is introduced which can reliably detect walking human figures contained in such images. The method works with spatio-temporal input information to detect and classify patterns typical of human movement. Our algorithm consists of real-time operations, which is an important factor in practical applications. The paper presents a new information-extraction and temporal-tracking method based on a simplified version of symmetry pattern extraction, a pattern characteristic of the moving legs of a walking person. These spatio-temporal traces are labelled by kernel Fisher discriminant analysis (KFDA). With the use of temporal tracking and non-linear classification we have achieved pedestrian detection in cluttered image scenes with a correct classification rate of 97.6% from 1-2 step periods. The detection rates of a linear classifier and an SVM are also presented, thereby demonstrating both the necessity of a non-linear method and the power of KFDA for this detection task.
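The kernel Fisher discriminant analysis used to label the spatio-temporal traces can be illustrated with a standard two-class formulation: project data into a kernel-induced feature space and find the dual coefficients that maximize between-class separation relative to within-class scatter. The sketch below uses an RBF kernel and a regularized linear solve; it is a textbook version under assumed hyperparameters (`gamma`, `reg`), not the paper's exact classifier or features.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row vectors of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kfda_fit(X, y, gamma=0.5, reg=1e-3):
    """Two-class kernel Fisher discriminant: returns dual coefficients
    alpha and a scalar decision threshold. y must contain labels 0 and 1."""
    K = rbf_kernel(X, X, gamma)
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    m0 = K[:, idx0].mean(axis=1)   # kernelized class means
    m1 = K[:, idx1].mean(axis=1)
    # within-class scatter in the kernel-induced feature space
    N = np.zeros_like(K)
    for idx in (idx0, idx1):
        Kc = K[:, idx]
        l = len(idx)
        N += Kc @ (np.eye(l) - np.full((l, l), 1.0 / l)) @ Kc.T
    alpha = np.linalg.solve(N + reg * np.eye(len(X)), m1 - m0)
    thr = 0.5 * (alpha @ m0 + alpha @ m1)  # midpoint of projected means
    return alpha, thr

def kfda_predict(X_train, alpha, thr, X_new, gamma=0.5):
    """Classify new points by projecting onto the discriminant direction."""
    return (rbf_kernel(X_new, X_train, gamma) @ alpha > thr).astype(int)
```

The non-linearity comes entirely from the kernel: with an RBF kernel the discriminant can separate classes (such as walking vs. non-walking traces) that no linear projection of the raw features could.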
Behavior and event detection for annotation and surveillance
Visual surveillance and activity analysis is an active research
field of computer vision. As a result, there are several
different algorithms produced for this purpose. To obtain
more robust systems it is desirable to integrate the different algorithms. To achieve this goal, the paper presents results in automatic event detection in surveillance videos, and a distributed application framework for supporting these methods. Results in motion analysis for static and moving cameras, automatic fight detection, shadow segmentation, discovery of unusual motion patterns, indexing and retrieval will be presented. These applications perform real time, and are suitable for real life applications